Assignment 1 Recap

To recap, in Assignment 1, I (1) analysed information about the dataset I have chosen, and its associated publication, (2) mapped the expression data to HUGO gene symbols, (3) filtered out unncecessary genes, and (4) normalized the dataset. The following subsections go over these points in a bit more detail, and are taken from my Assignment 1. In addition, the code from my .rmd file includes code from A1- in order to regenerate the normalized counts matrix.

Brief Introduction to the Data

The expression dataset that I had chosen is GSE221253, taken from the GEO expression data repository. The study associated with this dataset involved adoptive cell therapy. Adoptive cell therapy via tumor infiltrating lymphocytes (ACT-TIL) is a type of immunotherapy used to treat cancer. This study took samples from 13 patients having metastatic melanoma and performed RNAseq analysis, along with other analyses like spatial proteomics and scRNAseq, on tumor tissues before and after the cell therapy (pre- and post-ACT) in order to gain a better understanding of the interactions and cell states within the tumor microenvironment throughout treatment. For the RNAseq experiment, this resulted in a total of 26 samples, and 13 in each condition (two for each patient, and the experimental conditions being pre- and post-ACT treatment).

Mapping to HUGO Gene Symbols

The first step from Assignment 1 was to map the expression data to HUGO gene symbols. Only 940 out of 19117 genes from our original gene expression raw file were unable to be mapped to HUGO gene symbols. This is actually not that bad!

Dataset Filteration and Normalization

The next step in Assignment 1 was to filter out the genes with low counts and normalize our data. Filteration of outlier genes with low gene counts removed ~3678 genes from the expression dataset. Then, TMM normalization was performed to normalize the data. The plots (pre and post normalization + filtration) are shown below:

Plot of the Data Before Normalization and Filtration
Plot of the Data Before Normalization and Filtration
Plot of the Data After Normalization and Filtration
Plot of the Data After Normalization and Filtration

Assignment 2 Recap

To recap, the first step of Assignment 2 is to perform differential gene expression, in order to identify key genes that are differentially expressed between our subgroups. This involves (1) constructing our model design, (2) estimating the dispersion, (3) performing multiple hypothesis testing and using the Quasi likelihood model to calculate differential expression, and (4) generate a heatmap and MA plot of the data. Next, we performed threshold over-representation analysis

Model Design

First, this involved defining our model design, and inspecting the MDS plot (below) to see if there was any clustering between our two groups (pre-ACT and post-ACT). It doesn’t seem that there is a lot of clustering amongst the pre-ACT and post-ACT samples. However, the study associated with our data set aims to understand interactions and cell states within the tumor microenvironment throughout treatment, so it makes the most sense to define the group for our model design as pre- and post-ACT treatment.

Estimate Dispersion + Top Gene Hits

Here, we estimate the dispersion, fit the model to our specified model design, and then calculate differential expression using the Quasi liklihood model. The table below shows the top gene hits.

logFC logCPM F PValue FDR
ALB -14.893974 11.900720 40.50206 7.0e-07 0.0018536
FOSB 3.791452 5.261109 39.29955 9.0e-07 0.0018536
FOS 2.600458 7.410897 38.90495 1.0e-06 0.0018536
FGB -13.907270 10.372034 38.81796 1.1e-06 0.0018536
FGA -13.295751 9.650812 38.26330 1.2e-06 0.0018536
ZC2HC1B -2.871639 1.604275 37.93189 1.3e-06 0.0018536
GC -11.892857 7.806030 37.19659 1.5e-06 0.0018536
APOH -11.785979 7.789941 37.07206 1.5e-06 0.0018536
KNG1 -11.937946 8.299449 37.02790 1.5e-06 0.0018536
ARG1 -8.562373 5.402457 36.46001 1.8e-06 0.0018536
x
1 BH
x
1 samplesInfoMatrix$timepre-ACT
x
1 glm

Multiple Hypothesis Testing

In the below code blocks, we compile all of the results, and find how many genes pass the threshold value, and how many genes pass correction.

How many genes pass the p-value threshold p < 0.05?

## [1] 4684

Now, let us see how many genes pass correction for multiple hypothesis testing. ~25% of the genes pass correction. This is a decent number, as it is not too much or not too little genes.

## [1] 1643

MA Plot

Here is an MA plot that displays the differential gene expression between our two groups of samples: pre- and post- ACT.

Heat Map

The following codeblock generates a heatmap of our data, along with annotations, to see how the data seems to cluster. Based on what is shown in the heatmap below, there does seem to be some minimal clustering of genes that are overexpressed in some pre-ACT samples.

Thresholded Over Representation Analysis

The last step of this assignment is to perform thresholded over-representation analysis (ORA). Essentially, our main goal here is to see whether our upregulated or downregulated genes all share some kind of common characteristic (maybe some of them are from the same pathway, etc). By doing this, we can then make conclusions about what kind of genes are overexpressed/underexpressed and ensure that it aligns with literature.

Define All, Upregulated, Downregulated genes

Here, we define our upregulated and downregulated genes. Upregulated genes are defined here to have a p-value of 0.05 and positive fold change. Downregulated genes are defined here to have a p-value of 0.05 and negative fold change.

## [1] "Number of upregulated genes"
## [1] 1170
## [1] "Number of downregulated genes"
## [1] 473
## [1] "Number of significant genes in total"
## [1] 1643

Running G:profiler on All Significant Genes

Here, we run run g:profiler on the set of all significant genes that have a p-value of < 0.05 to perform a gene set enrichment analysis.

query significant p_value term_size query_size intersection_size precision recall term_id source term_name effective_domain_size source_order parents
query_1 TRUE 6.630947e-07 1494 1393 199 0.14285714 0.1331995 GO:0044281 GO:BP small molecule metabolic process 16178 11317 GO:0008152
query_1 TRUE 2.189967e-05 2145 1393 257 0.18449390 0.1198135 GO:0009056 GO:BP catabolic process 16178 3269 GO:0008152
query_1 TRUE 5.598736e-05 4941 1393 517 0.37114142 0.1046347 GO:1901564 GO:BP organonitrogen compound metabolic process 16178 22098 GO:00068….
query_1 TRUE 7.868727e-05 267 1393 51 0.03661163 0.1910112 GO:0016053 GO:BP organic acid biosynthetic process 16178 5145 GO:00060….
query_1 TRUE 9.388388e-05 764 1393 109 0.07824838 0.1426702 GO:0006082 GO:BP organic acid metabolic process 16178 1967 GO:00442….
query_1 TRUE 9.388388e-05 264 1393 50 0.03589375 0.1893939 GO:0046394 GO:BP carboxylic acid biosynthetic process 16178 12522 GO:00160….

Running G:profiler on the Upregulated Gene Set

In the codeblock below, we run g:profiler on the set of upregulated genes to perform a gene set enrichment analysis.

query significant p_value term_size query_size intersection_size precision recall term_id source term_name effective_domain_size source_order parents
query_1 TRUE 0.004049154 88 997 18 0.018054162 0.20454545 GO:0042775 GO:BP mitochondrial ATP synthesis coupled electron transport 16178 10709 GO:0042773
query_1 TRUE 0.004049154 4941 997 368 0.369107322 0.07447885 GO:1901564 GO:BP organonitrogen compound metabolic process 16178 22098 GO:00068….
query_1 TRUE 0.004049154 88 997 18 0.018054162 0.20454545 GO:0042773 GO:BP ATP synthesis coupled electron transport 16178 10707 GO:00061….
query_1 TRUE 0.004049154 8726 997 605 0.606820461 0.06933303 GO:0044238 GO:BP primary metabolic process 16178 11300 GO:0008152
query_1 TRUE 0.004049154 428 997 51 0.051153460 0.11915888 GO:0007005 GO:BP mitochondrion organization 16178 2648 GO:0006996
query_1 TRUE 0.004049154 6 997 5 0.005015045 0.83333333 GO:0036115 GO:BP fatty-acyl-CoA catabolic process 16178 9776 GO:00091….
query_1 TRUE 0.004049154 46 997 13 0.013039117 0.28260870 GO:0006120 GO:BP mitochondrial electron transport, NADH to ubiquinone 16178 1998 GO:00196….
query_1 TRUE 0.004049154 160 997 26 0.026078235 0.16250000 GO:0009060 GO:BP aerobic respiration 16178 3273 GO:0045333
query_1 TRUE 0.004244812 8309 997 579 0.580742227 0.06968348 GO:0006807 GO:BP nitrogen compound metabolic process 16178 2516 GO:0008152
query_1 TRUE 0.004266470 9757 997 666 0.668004012 0.06825869 GO:0008152 GO:BP metabolic process 16178 3159 GO:0008150
query_1 TRUE 0.004353541 99 997 19 0.019057172 0.19191919 GO:0022904 GO:BP respiratory electron transport chain 16178 6868 GO:00229….
query_1 TRUE 0.004460309 186 997 28 0.028084253 0.15053763 GO:0045333 GO:BP cellular respiration 16178 11755 GO:0015980
query_1 TRUE 0.004460309 490 997 55 0.055165496 0.11224490 GO:0061919 GO:BP process utilizing autophagic mechanism 16178 16215 GO:0009987
query_1 TRUE 0.004460309 490 997 55 0.055165496 0.11224490 GO:0006914 GO:BP autophagy 16178 2592 GO:00442….
query_1 TRUE 0.004460309 59 997 14 0.014042126 0.23728814 GO:0072332 GO:BP intrinsic apoptotic signaling pathway by p53 class mediator 16178 17869 GO:00723….
query_1 TRUE 0.005219212 61 997 14 0.014042126 0.22950820 GO:0010257 GO:BP NADH dehydrogenase complex assembly 16178 4072 GO:0065003
query_1 TRUE 0.005219212 61 997 14 0.014042126 0.22950820 GO:0032981 GO:BP mitochondrial respiratory chain complex I assembly 16178 8243 GO:00102….
query_1 TRUE 0.005971288 1168 997 107 0.107321966 0.09160959 GO:0009057 GO:BP macromolecule catabolic process 16178 3270 GO:00431….
query_1 TRUE 0.007634004 2145 997 176 0.176529589 0.08205128 GO:0009056 GO:BP catabolic process 16178 3269 GO:0008152
query_1 TRUE 0.007634004 107 997 19 0.019057172 0.17757009 GO:0022900 GO:BP electron transport chain 16178 6867 GO:0006091

Running G:profiler on the Downregulated Gene Set

Here, we do the same for the set of downregulated genes, and perform a gene set enrichment analysis on them.

query significant p_value term_size query_size intersection_size precision recall term_id source term_name effective_domain_size source_order parents
query_1 TRUE 1.545313e-14 1494 396 94 0.23737374 0.06291834 GO:0044281 GO:BP small molecule metabolic process 16178 11317 GO:0008152
query_1 TRUE 2.470354e-14 764 396 63 0.15909091 0.08246073 GO:0006082 GO:BP organic acid metabolic process 16178 1967 GO:00442….
query_1 TRUE 1.688262e-13 758 396 61 0.15404040 0.08047493 GO:0043436 GO:BP oxoacid metabolic process 16178 11048 GO:0006082
query_1 TRUE 1.688262e-13 739 396 60 0.15151515 0.08119080 GO:0019752 GO:BP carboxylic acid metabolic process 16178 6252 GO:0043436
query_1 TRUE 5.379899e-11 299 396 34 0.08585859 0.11371237 GO:0044282 GO:BP small molecule catabolic process 16178 11318 GO:00090….
query_1 TRUE 5.379899e-11 24 396 12 0.03030303 0.50000000 GO:0042730 GO:BP fibrinolysis 16178 10679 GO:0030195

Gene Sets Returned from Downregulated and Upregulated Analyses

Here, we present the number of gene sets returned from both the downregulated and upregulated analyses:

## [1] "How many genesets were returned from downregulated analysis?"
## [1] 5057
## [1] "How many genesets were returned from upregulated analysis"
## [1] 7844

Interpretation

The authors in the original paper concluded that in samples after Adoptive Cell Therapy, there was an increase in the strength and number of tumor infiltrating lymphocytes (TILSs), as well as an increase in interferon-activated myeloid T-cells (Barras 2024). This suggests that there seems to be an overall improved immune response towards the tumor after cell therapy. Upon looking at the over-representation results for down-regulated and up-regulated genes, however, there does not seem to be much correlation between the papers findings and the over-representation results. The overrepresentation results list common metabolic process such as “small molecule metabolic process” and “mitochondrial ATP synthesis coupled electron transport” as top hits, but these are not indicative of anything as they are very general/broad processes.

Assignment 3

In this section, we begin assignment 3. The first step is to perform non-threshold Gene set Enrichment Analysis. Next, we will visualize our GSEA using Cytoscape. Lastly, we will interpret and have a more granular understanding of our results.

Non-thresholded Gene set Enrichment Analysis

There are a lot of existing gene set analysis algorithm that are non-thresholded, but we will be using GSEA, as it is very popular, and well trusted throughout literature (the number of citations it has tops all other methods). We will also be using gene sets from the BaderLab.

#define path to the shell script to run the command line GSEA on
gsea_jar <- "GSEA_4.3.3/gsea-cli.sh"

# was previously set to true, but once we already have all of our results, we dont need this to be true anymore. 
run_gsea <- TRUE

#name of our analysis
analysis_name <- "Pre_vs_Post_ACT"

rnk_file <- "Pre_vs_post_ACT_edger_ranks.rnk"

# obtain the rank file to use in GSEA analysis
  #first sort our original gene hit list (from A2 recap) by computing the rank from the rank formula 


qlf_output_hits_rnk <- data.frame(
    gene = rownames(qlf_output_hits$table),
    rank = -log10(qlf_output_hits$table$PValue) * sign(qlf_output_hits$table$logFC)
)

#order our rank file based on decreasing rank value
qlf_output_hits_rnk <- qlf_output_hits_rnk[order(qlf_output_hits_rnk$rank, decreasing = TRUE), ]
file_path <- "Pre_vs_post_ACT_edger_ranks.rnk"

#write the necessary rank fie out to a .rnk file, if it does not exist
if(!file.exists(file_path)){
  write.table(qlf_output_hits_rnk, file = file_path, sep = "\t", quote = FALSE, row.names = FALSE, col.names = TRUE)
}

Here, we acquire the BaderLab dataset files, and first check to see if it exists.

# Here, we get all the necessary gene sets from the BaderLab
# Now we need to get the gene sets from the Bader Lab
gmt_url = "http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/"

# list all the files on server
filenames = getURL(gmt_url)
tc = textConnection(filenames)
contents = readLines(tc)
close(tc)

# get gmt with all the GO pathways, and DOES NOT have terms from electronic annotations. 
# for the purposes of this analysis, we will start with the gmt file that has only pathways

rx <- gregexpr("(?<=<a href=\")(.*.GOBP_AllPathways_noPFOCR_no_GO_iea.*.)(.gmt)(?=\">)",
        contents,
        perl = TRUE
    )

gmt_file <- unlist(regmatches(contents, rx))


if(!file.exists(gmt_file)){
    download.file(
      paste(gmt_url,gmt_file,sep=""),
      destfile=gmt_file
    )
}

Here, we have set run_gsea to FALSE, as this command was already run in a previous iteration, and all the relevant files were already added to the current directory.

# here, we will set up the command for GSEA and run it (if GSEA == true)

run_gsea <- FALSE
if(run_gsea){
  command <- paste("",gsea_jar,  
                   "GSEAPreRanked -gmx", gmt_file, 
                   "-rnk" ,rnk_file, 
                   "-collapse false -nperm 1000 -scoring_scheme weighted", 
                   "-rpt_label ",analysis_name,
                   "  -plot_top_x 20 -rnd_seed 12345  -set_max 200",  
                   " -set_min 15 -zip_report false ",
                   " -out" , getwd(), 
                   " > gsea_output.txt",sep=" ")
  system(command)
}

#at this point, we have completed running GSEA, and we have moved the relevant files into the working directory!

In this code block we quickly visualize our up-regulated pathways (postive normalize enrichment score- NES) and downregulated pathways (neg normalized enrichment score- NES). The up- regulated and down- regulated pathways are relative to out Post-ACT treatment group.

downregulated_pathways <- read.delim("gsea_report_for_na_neg_1712026640651.tsv")
head(downregulated_pathways)
##                                                                                  NAME
## 1                                NEGATIVE REGULATION OF WOUND HEALING%GOBP%GO:0061045
## 2                                         TERPENOID METABOLIC PROCESS%GOBP%GO:0006721
## 3 CYTOCHROME P450 - ARRANGED BY SUBSTRATE TYPE%REACTOME DATABASE ID RELEASE 81%211897
## 4                                       DITERPENOID METABOLIC PROCESS%GOBP%GO:0016101
## 5                                   NEGATIVE REGULATION OF HEMOSTASIS%GOBP%GO:1900047
## 6                            NEGATIVE REGULATION OF BLOOD COAGULATION%GOBP%GO:0030195
##                                                          GS.br..follow.link.to.MSigDB
## 1                                NEGATIVE REGULATION OF WOUND HEALING%GOBP%GO:0061045
## 2                                         TERPENOID METABOLIC PROCESS%GOBP%GO:0006721
## 3 CYTOCHROME P450 - ARRANGED BY SUBSTRATE TYPE%REACTOME DATABASE ID RELEASE 81%211897
## 4                                       DITERPENOID METABOLIC PROCESS%GOBP%GO:0016101
## 5                                   NEGATIVE REGULATION OF HEMOSTASIS%GOBP%GO:1900047
## 6                            NEGATIVE REGULATION OF BLOOD COAGULATION%GOBP%GO:0030195
##    GS.DETAILS SIZE         ES       NES NOM.p.val FDR.q.val FWER.p.val
## 1 Details ...   51 -0.7474599 -2.933088         0         0          0
## 2 Details ...   58 -0.7330024 -2.871940         0         0          0
## 3 Details ...   48 -0.7501460 -2.867836         0         0          0
## 4 Details ...   52 -0.7482500 -2.864391         0         0          0
## 5 Details ...   36 -0.8118737 -2.861843         0         0          0
## 6 Details ...   36 -0.8118737 -2.858984         0         0          0
##   RANK.AT.MAX                   LEADING.EDGE  X
## 1         638  tags=29%, list=4%, signal=31% NA
## 2        1295  tags=45%, list=8%, signal=49% NA
## 3        1610 tags=50%, list=10%, signal=56% NA
## 4        1295  tags=46%, list=8%, signal=50% NA
## 5         599  tags=39%, list=4%, signal=40% NA
## 6         599  tags=39%, list=4%, signal=40% NA
upregulated_pathways <- read.delim("gsea_report_for_na_pos_1712026640651.tsv")
head(upregulated_pathways)
##                                                                     NAME
## 1                              OXIDATIVE PHOSPHORYLATION%GOBP%GO:0006119
## 2 MITOCHONDRIAL ATP SYNTHESIS COUPLED ELECTRON TRANSPORT%GOBP%GO:0042775
## 3                       AEROBIC ELECTRON TRANSPORT CHAIN%GOBP%GO:0019646
## 4                                    AEROBIC RESPIRATION%GOBP%GO:0009060
## 5                   AEROBIC RESPIRATION I (CYTOCHROME C)%BIOCYC%PWY-3781
## 6               ATP SYNTHESIS COUPLED ELECTRON TRANSPORT%GOBP%GO:0042773
##                                             GS.br..follow.link.to.MSigDB
## 1                              OXIDATIVE PHOSPHORYLATION%GOBP%GO:0006119
## 2 MITOCHONDRIAL ATP SYNTHESIS COUPLED ELECTRON TRANSPORT%GOBP%GO:0042775
## 3                       AEROBIC ELECTRON TRANSPORT CHAIN%GOBP%GO:0019646
## 4                                    AEROBIC RESPIRATION%GOBP%GO:0009060
## 5                   AEROBIC RESPIRATION I (CYTOCHROME C)%BIOCYC%PWY-3781
## 6               ATP SYNTHESIS COUPLED ELECTRON TRANSPORT%GOBP%GO:0042773
##    GS.DETAILS SIZE        ES      NES NOM.p.val    FDR.q.val FWER.p.val
## 1 Details ...   73 0.6527362 2.173621         0 0.0000000000      0.000
## 2 Details ...   66 0.6572567 2.165148         0 0.0000000000      0.000
## 3 Details ...   65 0.6504819 2.151371         0 0.0000000000      0.000
## 4 Details ...  102 0.6214411 2.134509         0 0.0002267553      0.001
## 5 Details ...   62 0.6590620 2.132997         0 0.0001814043      0.001
## 6 Details ...   66 0.6572567 2.132463         0 0.0001511702      0.001
##   RANK.AT.MAX                   LEADING.EDGE  X
## 1        3704 tags=71%, list=24%, signal=93% NA
## 2        3127 tags=68%, list=20%, signal=85% NA
## 3        3127 tags=66%, list=20%, signal=83% NA
## 4        3704 tags=64%, list=24%, signal=83% NA
## 5        3704 tags=76%, list=24%, signal=99% NA
## 6        3127 tags=68%, list=20%, signal=85% NA

Non-Threshold GSEA- Interpretation Questions/Answers

1.What method did you use? What genesets did you use? Make sure to specify versions and cite your methods. There are a lot of existing gene set analysis algorithm that are non-thresholded, but I decided to use GSEA, as it is very popular, and well trusted throughout literature (the number of citations it has tops all other methods). I also used gene sets from the BaderLab, which can be found here. To perform the GSEA, I followed the methods outlined here

2.Summarize your enrichment results. Upon opening the index file outputted by GSEA, we can see that:

4263 / 5903 gene sets are upregulated in phenotype Post-Adoptive Cell Therapy Treatment, 1049 gene sets are significant at FDR < 25%, 445 gene sets are significantly enriched at nominal pvalue < 1% 944 gene sets are significantly enriched at nominal pvalue < 5%,

1640 / 5903 gene sets are upregulated in phenotype Pre-Adoptive Cell Therapy Treatment, 682 gene sets are significantly enriched at FDR < 25%, 441 gene sets are significantly enriched at nominal pvalue < 1%, 605 gene sets are significantly enriched at nominal pvalue < 5%

3. How do these results compare to the results from the thresholded analysis in Assignment #2. Compare qualitatively. Is this a straight forward comparison? Why or why not? From the statistics in question 2, we see that, more genes are up-regulated than down-regulated in the Post-ACT groups. This is also what we observed from our g:profiler results- the number of up-regulated genes was significantly more than the number of down-regulated genes. So in this regard, the comparison is straigntforward when comparing both the thresholded and non-thresholded analysis results. However, when performing thresholded over-representation analysis in Assignment 2, we first had to filter out the genes based on their p-values, to ensure that we only had significant up-regulated and down-regulated genes, resulting in an overall lower number of genes than GSEA. For GSEA, we end up using all of the genes, regardless of their p-value, to ensure that we dont loose any signal. So in this regard, it is not really a straightforward comparison due to the differing methods.

Visualizing GSEA in Cytoscape

Our next step is to use our results from non-thresholded GSEA and visualize our results in Cytoscape. In these figures presented, a node represents a gene set, and an edge represents genes in common between two gene sets. Red nodes represent gene sets specifically up-regulated in the post-ACT group, and blue nodes represents gene sets specifically down-regulated in the post-ACT group (or up-regulated in the pre-ACT group).

Visualize GSEA in Cytoscape- Interpretation Questions/Answers

1.Create an enrichment map - how many nodes and how many edges in the resulting map? What thresholds were used to create this map? Make sure to record all thresholds. Include a screenshot of your network prior to manual layout. Enrichment map is created below (with pathway annotations). There are a total of 323 nodes and 2352 edges. To generate by map, I used a node cutoff threshold of Q-value = 0.01, and for my edge cutoff, I used a threshold value of Q-value = 0.375. I used default parameters for the rest of the fields. Enrichment Map: Pre-ACT vs. Post-ACT The figure above shows the enrichment map representing gene sets for two groups: pre-ACT patients and post-ACT patients. Nodes in blue represent gene sets that are down-regulated in post-ACT patients. Nodes in red represent gene sets that are up-regulated in post-ACT patients.

2.Annotate your network - what parameters did you use to annotate the network. If you are using the default parameters make sure to list them as well. For my annotated network (figures below), I decided to annotate each gene set with its pathway name. I also decided to ensure that there are not any overlaps in cluster names, to ensure a clear image and no overlapping words. I also added a figure legend to both the images below.

Annotated Network (first set of clusters)
Annotated Network (first set of clusters)
Annotated Network (second set of clusters)
Annotated Network (second set of clusters)

3.Make a publication ready figure - include this figure with proper legends in your notebook. My publication ready figures, each with a figure legend are presented in question 2.

4.Collapse your network to a theme network. What are the major themes present in this analysis? Do they fit with the model? Are there any novel pathways or themes? My images above from question 2, are collapsed into “themes”/clusters. From the images presented in question 2, we see that there is a lot of pathways associated with processes: “ximelagatran action pathway”, “valdecoxcib action pathway”, “mediated fceri activation”, and “respiratory electron transport”. There are other themes present, but these are just the largest ones. There is not really a clear indication that these themes fit with my model- the authors from the original paper concluded that in samples post-ACT, there was an increase in the strength and number of tumor infiltrating lymphocytes (TILSs), as well as an increase in interferon-activated myeloid T-cells (Barras 2024), and from the themes it seems that valdecoxib action pathways, which treat inflammation and pain are down-regulated in the post-ACT group, whereas pathways related to fceri activation (which cause the release of inflammatory mediators such as histamines) were up-regulated in the post-ACT group. I am not able to find a reason for why there would be increased inflammation in post-ACT patients. I hypothesize that it might have something to do with inflammatory responses activated due to the cell therapy. Also, from my understanding, there are no novel pathways or themes present.

Interpretation and detailed view of results

1.Do the enrichment results support conclusions or mechanism discussed in the original paper? How do these results differ from the results you got from Assignment #2 thresholded methods I was not able to find any type of correlation between the conclusions discussed in the original paper and the enrichment results from GSEA. As I mentioned above, the original paper concluded that in samples post Adoptive Cell Therapy (ACT), there was an increase in the strength and number of tumor infiltrating lymphocytes (TILSs), as well as an increase in interferon-activated myeloid T-cells (Barras 2024). This means that post-ACT treatment, the bodies immune system has become stronger. The enrichment map shows a predominant cluster of pathways that are down regulated in post-ACT patients under the theme “valdecoxib action pathways”, and upon doing some literature search, I found that valdecoxib is essentially a drug that prevents inflammation and pain, but was not sure why it would be down-regulated in post-ACT patients. In addition, one of the other major themes that were up-regulated in post-ACT patients, present in the enrichment map was “fceri mediated activation”, which caused the release of inflammatory mediators such as histamine. I was also not able to find any literature as to why these pathways would be upregulated in post-ACT patients.

In regards to comparing these results with the results from assignment 2’s thresholded g:profiler methods, there was not much of a correlation. Common pathways presented in A2 were apoptotic pathways and molecular metabolic processes, neither of which really correlate with the big themes presented in the GSEA enrichment map.

2.Can you find evidence, i.e. publications, to support some of the results that you see. How does this evidence support your result? As I mentioned above, fceri mediated activation was a major theme presented in my enrichment map above. However, I was not able to find much evidence as to why these pathways were up-regulated in post-ACT patients.

Pathway/Theme to Investigate More Closely

The pathway that I chose to investigate in more detail was the FCERI MEDIATED NF-KB ACTIVATION pathway. I decided to choose this pathway as it is found in one of the most prevalent themes in our network above, and is up-regulated in post-ACT patients. Thus, I wanted to investigate it in a bit more detail, to see if I can find any information and insights as to why pathways that mediate release of inflammatory molecules are up-regulated. Below, I used STRING to generate the pathway as a gene network. I was unable to, however annotate the network or pathway with my log fold expression values and p-values, because every time I tried to import my rank file, and select gene names as the parameter, Cytoscape wouldn’t generate the correct output. I have explained this issue in my journal as well.

STRING Pathway Up Close
STRING Pathway Up Close

References

Morgan M, Ramos M (2023). BiocManager: Access the Bioconductor Project Package Repository. R package version 1.30.22, https://CRAN.R-project.org/package=BiocManager.

Davis, S. and Meltzer, P. S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 2007, 14, 1846-1847

Xie Y (2023). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.45, https://yihui.org/knitr/.

Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140

Response to tumor-infiltrating lymphocyte adoptive therapy is associated with preexisting CD8+ T-myeloid cell … (n.d.). https://www.science.org/doi/10.1126/sciimmunol.adg7995

Gu Z, Eils R, Schlesner M (2016). “Complex heatmaps reveal patterns and correlations in multidimensional genomic data.” Bioinformatics. doi:10.1093/bioinformatics/btw313

Gu Z (2022). “Complex Heatmap Visualization.” iMeta. doi:10.1002/imt2.43.

Kolberg L, Raudvere U, Kuzmin I, Vilo J, Peterson H (2020). “gprofiler2– an R package for gene list functional enrichment analysis and namespace conversion toolset
g:Profiler.” F1000Research, 9 (ELIXIR)(709). R package version 0.2.3.

Seidel, C. S. C. (n.d.). Intro to Edger. https://research.stowers.org/cws/CompGenomics/Projects/edgeR.html

Shi D, Jiang P. A Different Facet of p53 Function: Regulation of Immunity and Inflammation During Tumor Development. Front Cell Dev Biol. 2021 Oct 18;9:762651. doi:10.3389/fcell.2021.762651. PMID: 34733856; PMCID: PMC8558413.

Agrawal a, et al. (2024) WikiPathways 2024: next generation pathway database. NAR.

Milacic M, Beavers D, Conley P, Gong C, Gillespie M, Griss J, Haw R, Jassal B, Matthews L, May B, Petryszak R, Ragueneau E, Rothfels K, Sevilla C, Shamovsky V, Stephan R, Tiwari K, Varusai T, Weiser J, Wright A, Wu G, Stein L, Hermjakob H, D’Eustachio P. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Research. 2024. doi: 10.1093/nar/gkad1025.

Ashburner et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000 May;25(1):25-9. DOI: 10.1038/75556

Cytoscape App Store - enrichmentmap. (n.d.). https://apps.cytoscape.org/apps/enrichmentmap

Gary Bader, R. I. (n.d.). Pathway and network analysis of -OMICS data ( June 2023 ). Module 3 Lab: GSEA Visualization.https://baderlab.github.io/CBW_Pathways_2023/gsea_mod3.html

“Valdecoxib (Oral Route) Description and Brand Names.” Mayo Clinic, Mayo Foundation for Medical Education and Research, 1 Feb. 2024, www.mayoclinic.org/drugs-supplements/valdecoxib-oral-route/description/drg-20066654.

“FC Epsilon Receptor (FCERI) Signaling.” National Center for Biotechnology Information. PubChem Compound Database, U.S. National Library of Medicine, pubchem.ncbi.nlm.nih.gov/pathway/Reactome:R-HSA-2454202.Accessed 3 Apr. 2024.